Non-Markovian Optimal Prediction 

Alexandre J. Chorin, Ole H. Hald 
Department of Mathematics 
Lawrence Berkeley National Laboratory 
1 Cyclotron Road 
Berkeley CA 94720 

Raz Kupferman 
Institute of Mathematics 
The Hebrew University 
Jerusalem, Israel 



Abstract 

Optimal prediction methods compensate for a lack of resolution 
in the numerical solution of complex problems through the use of 
prior statistical information. We know from previous work that in 
the presence of strong underresolution a good approximation needs 
a non-Markovian "memory", determined by an equation for the "or- 
thogonal", i.e., unresolved, dynamics. We present a simple approxi- 
mation of the orthogonal dynamics, which involves an ansatz and a 
Monte-Carlo evaluation of autocorrelations. The analysis provides a 
new understanding of the fluctuation-dissipation formulas of statisti- 
cal physics. An example is given. 



1 Introduction 

Many problems in science and engineering are described by nonlinear differ- 
ential equations whose solutions are too complicated to be properly resolved. 
The problem of predicting the evolution of systems that are not well resolved 
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has been addressed by the present authors and others in |T|, ^ [3], f|, [5], ^| [7], 
O [13], [T5|]. Nothing can be predicted without some knowledge about the un- 
resolved ( "subgrid" ) degrees of freedom. In the optimal prediction methods 
just cited it is assumed that one possesses, as one often does, prior statisti- 
cal information about the system in the form of an invariant measure; what 
is sought is a mean solution with respect to this prior measure, compatible 
with the information initially at hand as well as with the limitations on the 
computing power one can bring to bear. 

The simplest version of this idea, Markovian optimal prediction, generates 
an approximating system of ordinary differential equations and works well for 
a time that depends on the degree of underresolution and on the uncertainty 
in the data. This version is optimal in the class of Markovian approximations 



[ 13fl , but it eventually exhibits errors, because the influence of partial initial 
data on the distribution of the solutions weakens in time if the system is 
ergodic, and this loss of information is not captured in full, see || [F|. To 
obtain an accurate approximation of a subset of variables without solving 
the full problem requires the addition of a "memory" term, and the resulting 
prediction scheme becomes a generalized Langevin equation, similar to those 
in irreversible statistical mechanics ||, |TU , [Tl], |T7] . 



We present a general formalism for separating resolved and unresolved de- 
grees of freedom, analogous to the nonlinear projection formalism of Zwanzig 
PI but using the language of probability theory. We find a zero-th order 
approximate solution of the equation for the orthogonal unresolved dynamics 
and find its statistics by Monte-Carlo integration; we use the results to con- 
struct a prediction scheme with memory. We apply the scheme to a simple 
model problem. In the conclusion we indicate how the construction is gener- 
alized to more complicated problems, and the new perspectives it opens for 
prediction as well as for irreversible statistical mechanics. 
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2 Linear and nonlinear projections of dynam- 
ical systems 



Consider a problem of the form 

dip 

~dt ~ 



(2.1) 



where R and (p are n-dimensional vectors (n may be infinite), with compo- 
nents Ri and (pf, t is time. When n is finite (|2.1|) is a system of ordinary 
differential equations. Our goal is to calculate the average values of m com- 
ponents of ip, m < n, without calculating all the components; the average 
is over all the values that the missing, unresolved, components may assume; 
our prior information allows us to make statistical statements about these 
missing components. 

We denote the phase space (the vector space in which (p resides) by T; in 
classical statistical physics this phase space is the n = 6£ dimensional space 
of coordinates and momenta (qi,Pi), where i is the number of particles; the 
qi,Pi at time t are then entries of the vector (p. A solution of equation ( |2.1[ ) is 
defined when an initial value ip(t = 0) = x is given; to each initial condition 
x corresponds a trajectory, (p(t) = tp(x,t); the initial value x is emphasized 
by this notation in view of its key role in what follows. 

A phase variable u is a function on T; u may be a vector, whose compo- 
nents are labeled as itj. A phase variable varies when its argument varies in 
time, so that a phase variable whose value at t = was u(x) acquires at time 
t the value u(ip(x,t)). It is useful to examine the evolution of u in a more 
abstract setting: Introduce an evolution operator S f for phase variables by 
the relation 




(2.2) 



Differentiation of (p.2|) with respect to time yields 



-(^ M )(x) = J2M^x,t)) — ^(x,t)) = LS'uix) 



(2.3) 
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where L, the Liouvillian, is the linear differential operator L = £\ Ri[cp)-^-. 
Thus the phase variable S t u can be calculated in either of two ways: (i) for 
each x integrate to time t the equations of motion 4f<f(t) = R(ip(t)) with 
initial conditions (p(0) = x and evaluate the phase variable u at the point 
ip(x,t); or (ii) solve the equation 



It is convenient to write S* = e tL ; we do not inquire here as to the conditions 
under which this symbolic notation can be taken literally. The significant 
thing about equation (|2.3| ) is that it is linear. 

One can check from the definitions that e tL (uv) = (e tL u)(e tL v), e tL f(u) = 
f(e tL u) for any function /, and e tL L = Le tL . In this notation, the symbol 
u standing alone refers to the data u{x) at t = 0; the time dependence is 
described by the exponential. 

Suppose that the initial data x are drawn from a probability distribution 
fjP; each initial datum gives rise to a solution of equation ( |2.1|) and the 
measure fj, evolves into a measure // at time t. The evolution of /x* is 
defined by the conditions 

/ u((p(x,t)) fM°(dx) — / u(x) jJ(dx) 



for all sufficiently smooth phase variables u. We assume that the measure 



invariant measures, in particular, any Hamiltonian system with Hamiltonian 
H leaves invariant the canonical measure with density Z~ 1 e~ H ( x ^ T , where Z 
is a normalization constant and T is the variance of the samples, which in 
physics is the temperature. Given a phase variable u, we denote by E[it] the 
expected value of u with respect to the invariant measure /i, 




fjP is invariant under the flow Q2.1p : fx 1 = \i 



o _ 



ji. Many systems have 



E[u] 
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We endow the space of phase variables with the inner product (it, v) = E[uv], 
which makes them elements of the Hilbert space L 2 [T,fi] (L 2 for brevity). 
The representations ( j2.1|) and ( j2.3| ) of the dynamics are equivalent; in par- 
ticular one can retrieve ( |2.1| ) from ( |2.3| ) by setting u = Xi and therefore 
e tL u = (fi(x,t) (the i-th coordinate function). In L 2 the Liouvillian operator 
L is antisymmetric. 

We wish to calculate the means of a small number of variables (fii(x, t),i = 
1, . . . ,m, m < n, averaged over all the values the other variables may take 
initially when they are drawn from the invariant distribution, without calcu- 
lating any of these other variables. We write (p = (tpx, . . . , (p m ) for the vector 
whose entries are the variables we actually calculate. To find equations for 
(p we need projections of L 2 on subspaces of functions of x — (xi, . . . ,x m ) 
or of (fi(x,t) = . . . , <fm)'i these projections are not unique and we shall 
consider two different ones: 

(i) Consider the conditional expectation E[w|u], where both u and v are 
phase variables; it satisfies: 

1. E[t»|w] is a function of u; 

2. K[v \u] is linear in v. 

E[avi + j3v 2 \u] =aE[v 1 \u] + /3E[v 2 \u]. 



3. E[u|m] is the best approximation of v by a function of u: 

E[\v - E[v\u}\ 2 ] < E[\v - f(u)\ 2 } (2.4) 

for all functions /. 
See Chung ||. 

E[v \u] is the orthogonal projection of v on the space of functions of u, 
and we can write Pv = E[v\u]. This is the "nonlinear projection" used in 



18] with a different interpretation, as well as in 
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ii)For a function w = w(x) in L 2 , define 

p'w = sr-r^U, (2-5) 

this is the "linear projection" used in most of irreversible statistical mechanics 
(see e.g. (§0,|). 

Clearly the projection based on conditional expectations puts more in- 
formation into the subspace of functions of the smaller set of variables and 
is thus preferable for our purposes, but we shall need both projections for 
technical reasons. There is a large set of projections intermediate between 
these two, obtained by spanning the space of functions of x by additional 
elements; we shall not discuss these intermediate projections here. 

We now follow the Mori-Zwanzig procedure (|| [11], [16], [L7]]) and split the 
time derivative of e tL u into its projection on the functions of (pit) plus a 
complement: 

d 

—e tL u = e tL Lu = e tL PLu + e tL QLu, (2.6) 
at 

where P is either of the two projection just described and Q = I — P. If u 
is a scalar function of ip, it is projected in L 2 on a function of 0; if u is a 
vector function of <p, each of its components is projected. The first term in 
( |2.6j ) can be evaluated in terms of u and we denote it by d\{u): 

e tL m(u) = D\(e tL u). 

To understand the second term, consider an evolution operator, e*^ L , 
defined for a phase variable v by the equation: 

^e tQL v = QLe tQL v = Le tQL v - PLe tQL v. (2.7) 

with Pv(t = 0) = 0. This is the orthogonal dynamics equation, which should 
be read as follows: Let w = v at t = 0; let w(t) be a time-dependent phase 
variable that satisfies 

^w(t) = Lw(t) - PLw(t); (2.8) 
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with initial datum w(t = 0) = v, and set w(t) = e tQL v, denning the orthog- 
onal evolution operator e t( ^ L . Note that if v is orthogonal to the range of P 
then so is e t( ^ L v for all times t. The operator e*^ L is the solution operator of 
the orthogonal dynamics. 

The evolution operators e tL and e l ® L satisfy the Dyson formula [[J: 

e tL = e tQL + f e (t-s)LpLe sQL ds, (2.9) 
Jo 

which can be checked by differentiation. With the help of the Dyson formula 



the second term on the right hand side of (|2.6|) can be written as: 

JQL. 



e 



'^v + [ e {t - s)L PLe sQL vds, 
Jo 



where 

v = Lu — PLu = QLu. 

Putting all the terms together, we obtain the generalized Langevin equa- 
tion 



^-e tL u = m{e tL u) + f e {t - s)L PLe sQL vds + e tQL v. (2.10) 



This is an identity between phase variables, the starting point for our approx- 
imations. If one projects this equation on the space spanned by the initial 
data for u, the last term drops out and the other terms acquire a projection 
operator as prefactor. 

The various terms in equation ( p.!0[ ) have conventional interpretations. 



The first term on the right-hand side is a function only of e tL u and represents 
the self-interaction of the components of e tL u; it is the Markovian contribu- 
tion to ^e tL u. The second term depends on u through the values of e tL u at 
all times s between and t, and embodies a non-Markovian memory. Finally, 
the third term has no component in the range of P and if we know nothing 
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outside this range it can be viewed as random, with statistics determined by 
the initial data. 

In the special case of a linear projection P' the second term in equation 
( 2.1(J| ) acquires a particularly simple form, well-known in statistical physics 
where it is the only form in general use. By definition, see above, 

in 

P'Le sQL QLu = P'LQe sQL QLu = ^(LQe sQL QL M , h^h^x), (2.11) 

1 

where hj(x) = Xj/\\xj\\, \\xj\\ 2 = E[£j], 1 < j < m, are normalized coordi- 
nate functions. A short manipulation converts this sum into 

-J2Kj(s)h,(x), (2.12) 

where Kj(s) = (e sQL QLu,QLhj), i.e., Kj is the correlation of the solution 
of the orthogonal dynamics equation that starts from v = QLu with QLhj. 
Substitution into the integral produces the term 

i 

K j(t - s)e sL h j (x)ds. (2.13) 

3 Implementation of the projections in a small 
dimensional space and first-order optimal 
prediction 

The generalized Langevin equation ( |2.1QQ expresses the rate of change of the 
phase variable u as a sum of terms that depend on u and on the orthogo- 
nal dynamics. These expressions cannot be used directly for approximation 
when the equations are nonlinear. Indeed, the evaluation of a term such as 
e tL E[Lu\u] = E[Le tL u\e tL u] requires the evaluation of e tL u, i.e., requires a 
solution of the full set of n equations. In terms of (p, this term becomes 

E[R(y)\y]^ {Xtt) , (3.1) 




8 



where the last expression is the expected value of R(y) given the value <p(x, t) 
of y. ( we first find the best approximation of R(y) by a function of y, then 
substitute the value y = <p(x, t) into the result). The vector x has components 
x not included in x and not known; x = (x, x). To obtain something that can 
be evaluated we move the expectation into the condition, i.e., we approximate 
^[R\y\a=<p(x,x,t)] by ^[R\y\i)=M[<p(x,t)\£l, which is a function y = y(x) of x. 

Assume for a moment that the second and third terms in the Langevin 
equation (|2.10|) are negligible. Equation fl2.10|) becomes: 



-0(x,t) =E[R{y)\y]^ {x>t) . (3.2) 
The approximation we just made, replacing <p(x,t) by its projection, gives: 

^0(x,t) =E[R(y)\y]$ =m ; (3.3) 
projecting this equation on the space of functions of x, we find, 

j t y = nR{y)\y]y=m> (3-4) 

The initial data y(0) = x make y(t) a function of x. These are the first- 
order optimal prediction equations. First-order optimal prediction neglects 
however the effect of the fluctuations due to the variation of the components x 
of x, as we already know from [0 ; all that is left the uncomputed components 
is their conditional expectations. 

As an example, consider the model equations that arise from the Hamil- 
tonian 

H = 

(xi, x 3 can be viewed as position variables qx, q 2 and x 2 , x 4 can be viewed as 
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momenta px,P2). The resulting equations of motion are: 

d 2 



d 

dt 

d 2 



with y?(0) = x. The probability density of x is Z~ 1 e~ H ^ > ' T , and that of (p 
is Z~ 1 e~ H ( tp >' T . For simplicity set T = 1. Keeping m = 2 out of the n = 4 
equations, and omitting all but the first term in the Langevin equation ( |2.10|) , 
we find: 

d (p 2 

dt 1 + ^2 

d 

dt^ = ^ 

as can be deduced from the definition of conditional expectation: 

f-^ x 2 xle~ H dx 3 dx 4 x 2 



e- H dx 3 dx, 1 + xl 



(3.6) 



These are the first-order optimal prediction equations as presented in 0. [J. 

4 Zero-th order solution of the Langevin equa- 
tion 

Now consider the full Langevin equation ( |2.10| ). The evaluation of the mem- 
ory term requires a solution of the orthogonal dynamics equation, which we 
shall present in detail elsewhere. In the present paper we shall be content 
with less, yet the result is instructive. 
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With the appropriate substitutions, the Dyson formula (|2.9|) yields the 
identity 

e tQL = e tL_ f e {ts)L pLe sQL d ^ ^ 

Jo 

which can be in principle be the starting point of a perturbative evaluation 
of e t( ^ L . The zero-th order approximation of this equation is: 

e tQL ^ e tL^ £ 4 _ 2 ) 

i.e., one replaces the true flow in the orthogonal complement of the range of 
P (or P') by the real flow induced by L ( which of course has a component 
in the orthogonal complement). This approximation can be sufficient if the 
correlations in ( |2.13| ) do not decay too slowly; to see this, consider the second 
term in equation ( |2.10| ): 



f e {t - s)L PLe sQL QLuds = [ e {t ~ s)L PLQe sQL QLuds. 
Jo Jo 



(4.3) 



Adding and subtracting equal quantities, we find: 

PLe sQL QLu = PLQe sL u + PLQ(e sQL - e sL )QLu; (4.4) 
a Taylor series yields: 

e sQL _ e sL = I + s q L + 1-sL = -sPL + 0(s 2 ), (4.5) 

and therefore, using QP = 0, we find: 

PLe sQL QLu = PLQe sL QLu + 0(s 2 ), 

[ e {t - s)L PLe sL QLuds = [ e {t ~ s)L PLe sQL QLuds + 0(f)- 
Jo Jo 

if the correlations (e s ® L QLu,QLhj), (e sL QLu,QLhj), are significant only 
over short times, we have an acceptable approximation. 
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At t = the orthogonal dynamics start from v(0) = R(x) — K[R(x)\x]; at 
later time, in our approximation, v(t) = e tL v(0) = R((p(x,t))—K[R(y)\y]j} = $r x ,t)- 
The correlations in ( [2.13 ) can be calculated by Monte-Carlo, and provides a 
better approximation than first-order optimal prediction. This is still not a 
very good approximation, and the reason, as we shall show elsewhere, is that 
the linear projection P' used to derive Q2.13|) projects onto a set of functions 
that fails to span the whole space of functions of x; this can be remedied 
by devising more elaborate approximations to the conditional expectation 
P. Here we propose a heuristic alternative: We interpret E,[R(y)\y]^ = ^ x ^ 
in v(t) as the mean of the right-hand-side of equations ( |2.1|) and R(ip(x,t)) 
as the fluctuation around that mean. This mean varies less than the fluctu- 
ations as x spans T, and we make it vary even less by anchoring it to the 
initial data for the specific problem we wish to solve, i.e., first approximate 
®-[R(y)\y\y=<p(x,t) by ^[R{y)\y]y=E[<p(x,t)\x] and then further fix x at the specific 
initial value for which we want to solve the m— equation approximation of the 



system (|2.1| ). Let w(t) be the function w(t) = ^[R(y)\y]y=E[<p( x ,t)\x] obtained 
in this way for a specific x (in the example, w(t) = — ip 2 — V9 2 /(1 + </?|), where 
ip2 is the specific solution we are seeking). Note that v(t) = R(<p(x, £)) —w(t) 
is no longer a stationary stochastic process. The heuristic rationale for this 
"freezing" of w is as follows: Write as before x = (x, x); when, in the averag- 
ing that determines the kernels Kj, we fix x and vary x alone, thus finding 
the contribution of the subspace orthogonal to x, the "frozen" and the true 
variations of v(t) are the same; however, if we keep x fixed and vary x for the 
evaluation of R(<p) we relate the fluctuations to the specific solution we are 
seeking more thoroughly than is possible otherwise when the projection is P' . 
With these approximations, the kernels Kj in (|2.13| ), Kj(t — s) = K[v(t)v(s)}, 
become 

Kj(t -s) = K°(t -s)- Wj(t) Wj ( S ). (4.6) 

where K® = W,[Rj((p(x,t)Rj(<p(x, s)] can be evaluated by Monte-Carlo: Sam- 
ple initial data from the invariant distribution over and over, solve the result- 
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ing system of equations, whose solutions are statistically stationary in time, 
and average. Note that this calculation does not depend on the particular 
known initial values x, and thus can be done once and for all for a given 
equation and a given level of truncation. The time evolution of the Vj(t) 
need be accurate only as long as Vj(t) and Vj(0) are correlated. 

Substitute this into the Langevin equation: there is no impediment to 
using the true e t( ^ L in the third term; then average with respect to x; note 
that the Kj are functions of time only and commute with a projection on the 
initial data, while the average of the third term is 0. We find the following 
equation: 



d f l m 

Vj = w (t) ~ / - s ) - (4-7) 

Jo 1 



dt 



where j — 1, . . . , m and y(x, 0) = x. 

Equation Q4.7|) states that the solution of the problem depends not only 
on the mean of the R but also on the autocorrelation of the fluctuations. The 
solution is thus "renormalized" ( see e.g [14|). The sources of error in ( f4.7|) 
are: An inconsistent use of projections, a simplistic solution of the orthogonal 
dynamics, and possibly an inaccurate Monte-Carlo evaluation of the kernels 
K°. All can be remedied. 

In the example, ( |4.7|) reduces to: 

j t Vi = <"(«)- J (K (t-s)-w{t)w{s))y l (s)ds, (4.8) 

!» = -Vu (4.9) 

The kernel K° is K° = E[Rx(if(x, t))Ri(p(x, s))}, R x (<p) = -(p 2 - (p 2 (p% 
w(t) = — %)2 — IJ2/ (1 + an< i expected value in K° is evaluated over all 
choices of x in n = 4 dimensions drawn from the invariant distribution with 
density Z~ l e~ H / 7 '. 

In Figure 1 we display some numerical results for yi(t) in the example, 
with data xi = l,x 2 = 0. We show the "truth" found by averaging many 
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solutions of the full 4— dimensional system, the Galerkin approximation that 
sets all unknown functions to zero, the first-order optimal prediction, and the 
solution of equation ( [4.9| ). The solutions are shown in a rather favorable case 
( they look less striking if one exchanges for example the values of xi, Xq). 

5 Conclusions 

At first sight, the results in Figure 1 are interesting but not overwhelming: 
The cost of evaluating K° is comparable to the cost of evaluating the "truth" 
by Monte-Carlo, and the gain is not obvious. One may note however, that 
once K° has been evaluated, the cost of rerunning the calculation with any 
other initial data x is negligible. 

This is the significant fact: K° is evaluated "at equilibrium", i.e., with all 
components of x, the initial data, sampled from the invariant measure; 
does not depend on any specific initial value. As we shall show elsewhere, 
the analogous statement is true for an accurate solution of the orthogonal 
dynamics. Once the heavy work of determining memory functions has been 
done, the solution of a specific problem is plain sailing. At equilibrium, one 
can bring to bear the panoply of scaling methods and equilibrium statistical 
mechanics. One can say that the Mori-Zwanzig formalism makes possible the 
use of "universal" (non problem specific) results to solve specific problems. 
In some of the applications we have in mind, the "large" (n-dimensional) 
problems are partial differential equations, and then imperfections in the 
evaluation of memory terms are immaterial as long as the rate of convergence 
of finite-dimensional approximations is enhanced ( see Jlj, [|, |7|]). 

It is taken for granted in the physics literature that the memory kernels 
are autocorrelations of the "noise" (i.e., the orthogonal dynamics). This is 
true also in the example we have presented here. However, it should be 
obvious from the discussion that this is an artifact of the use of P', the linear 
projection, and that the full truth is more complicated and interesting. 
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Furthermore, in the physics literature one usually deals with memory 
by separating "fast" and "slow" variables and assuming that the orthogonal 
dynamics generated by "fast" variables generate "noise" with delta-function 
memory ; note in contrast that in our problem the unresolved and the resolved 
variables have exactly the same time scale. 

Finally, the heuristic arguments of the present paper will be replaced in 
general by a systematic approximation of the Langevin equation, including 
a systematic evaluation of the orthogonal dynamics, as we will explain in 
future publications. 
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